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ABSTRACT 

Due to its conceptual simplicity and its proven effectiveness in real-time detection and 
removal of radio frequency interference (RFI) from radio astronomy data, the Spectral 
Kurtosis (SK) estimator is likely to become a standard tool of a new generation of radio 
telescopes. However, the SK estimator in its original form must be developed from 
instantaneous power spectral density (PSD) estimates, and hence cannot be employed 
as an RFI excision tool downstream of the data pipeline in existing instruments where 
any time averaging is performed. In this letter, we develop a generalized estimator with 
wider applicability for both instantaneous and averaged spectral data, which extends 
its practical use to a much larger pool of radio instruments. 
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1 INTRODUCTION 

The Spe ctral Kurto s is est imator {SK) was originally pro- 
posed bv lNita et al.l (|2007l ) as a statistical tool for real-time 
radio frequency interference (RFI) detection and excision in 
a Fast Fourier Transform (FFT) radio spectrograph. The 
first spectrograph designed for SK, the Ko rean Solar Ra- 
dio Burst Locator fKSRBL: lDou et al.ll2009f ). demonstrated 
the effectiveness of the SK algorithm, but also revealed the 
need for a more accurate calculation of the theoretical RFI 
detection thr e sholds than initially proposed. Consequently, 
iNita fc Gary! l|2010ar ) derived the exact analytical expres- 
sions for the statistical moments of SK and, based on its 
first four standard mo ments, assigne d to it a Pearson Type 
IV probability curve (|Pearsonlll985r ). which was shown to 
be in very good agreement with the Monte Carlo simulated 
SK probability density function (pdf), as well as with the 
distribution derived from direct ex perimental observ ations 
made with the KSRBL instrument IjGarv et al.ll2010l ). 

As extensively described in the previous papers, what 
makes an SK spectrograph with N spectral channels distinct 
from a traditional one is the fact that it accumulates not 
only a set of Af instantaneous power spectral density (PSD) 
estimates, denoted Si, but also the squared spectral power 
denoted 5*2. These sums, which have an implicit dependence 
on frequency channel fk, are used to compute the averaged 
power spectrum (P) = Si/M, as well as the quantity 
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which is a cumulant-based estimator of the spectral variabil- 
ity corresponding to the signal parent population. 
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where fik and a^ are the frequency-dependent PSD pop- 
ulation means and variances, respectively. For a normally 
distributed time domain signal, i.e. an RFI-free signal, 
[Nita & Gary (2010a) showed that the estimator given by 
equation ([T]) is unbiased, i.e. E{SK) = V^ = 1. However, 
this distinctive feature of an SK spectrograph prevents the 
employment of SK as an RFI excision algorithm in an al- 
ready existing instrument that is hardware limited to output 
only an averaged power spectrum, (P) , without any possibil- 
ity to intercept the instantaneous spectra needed to accumu- 
late S2 used in SK. To overcome this hardware limitation, 
we introduce in this letter a generalized SK estimator defined 
in terms of the sums Si — T,{P)n and S2 = Tj{P)%, where 
{P)n ~ Tj^iPi/N represents the averaged power spectrum 
over an arbitrary number A'^ of instantaneous FFT spectra, 
which is the spectrograph output, while the outer sums are 
taken over M such consecutive outputs. This new general- 
ized SK reduces to equation ([1} when A'^ = 1. 



2 A GENERALIZATION OF THE SPECTRAL 
KURTOSIS ESTIMATOR 

The generalization of the estimator SK may b e achieved di- 
rectly f rom a fundamental property proven bv lNita fc Garvl 
l|2010ah . which pertains to any gamma distribution 



f(x,a,d) 



2'ir{d) 



(3) 



2 G. M. Nita and D. E. Gary 



where r(2:) = ^^ t^~^e~^dt is the well known Euler's 
Gamma function. This property, which is the basis of the 
whole SK concept, may be stated as follows 

Theorem 2.1. Given a set of M independent random vari- 
ables {xi} sampled from a parent population described by 
a gamma distribution f{x,a,d), the infinite set of statisti- 
cal moments of the ratio MS2/SI, where ^i = J^ftiXi and 
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(¥)' 



are given by the expression 
M"r(Md) 






(i) 



which is independent of the mean of the underlying distri- 
bution. 



In IjNita fc Garvll2010al ) we proved this property in the par- 
ticular cases d = 1 (exponential distribution) and d = 1/2 
(x distribution). However, following the same exact steps, 
it may be shown that Theor em [2Tn generally holds for arbi- 
trar y values of d llNita fc Ga ry 2010b). 

iNita fc Gary! ( 2010al ) used the particular forms of equa- 
tion ([4} corresponding to d = 1 to derive the SK estimator 
given by equation([T]), and to d = 1/2 to derive a time do- 
main kurtosis (TDK) estimator 



- M + 2fMS2 ; 



(5) 



both of them being unbiased estimators of the spectral vari- 
ability corresponding to the underlying probability distribu- 
tion f{x,a,d), which according to equation ([2]) is 



V^ = E{x^)/Eixy 



l/d, 



(6) 



which is 1 for d = 1, and 2 for d = 1/2. 

However, since the final goal of this study is to provide 
a generalized SK that would work for arbitrary distribution 
functions f{x,a,d), we find at this point useful to define it 
in a form that would provide an unbiased estimation of the 
normalized spectral variability V^d = 1, rather than V^. 
This transformation gives the generalized estimator a fixed 
expectation for any d, while leaving its statistical properties 
unchanged, thus preserving its performance as a statistical 
detector of outliers. 

Therefore, considering the result given by equation Q, 
we define the generalized SK estimator as 
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which, for any observable x sampled from a gamma distri- 
bution f{x, a, d), is an unbiased estimator of the normalized 
variability V^d, i.e. E(SK) = 1. Note that, for d = 1, SA" 
reduces to the expression given by equation ^, while for 
d = 1/2, the original TDK estimator has to be modified 
according to equation ([7]). 

Getting back to the original motivation behind this 
study, i.e. the problem of adapting our original RFI excision 
algorithm to a spectrograph that is hardware constrained to 
output only power estimates already averaged over A*' on- 
board accumulations, we consider the case of having as the 
only available observable the mean {x)n ~ {l/N)'EjL-^Xi. 
Since the probability distribution of the mean of A^ inde- 



pendent random variables sampled from a gamma distribu- 
tion is also a gamma distribution given by /((x)jv, a/N, Nd), 
(JNita fc Garvll2010al '). it immediately follows that the mean 
(a:) satisfies the condition required by Theorem 12. II and its 
associated SK estimator may be defined according to equa- 
tion 0, where Si = Ef£i((a;)jv),, 5*2 = Ef£i((a:)iv)?, and 
d — >■ Nd. Moreover, since {x)n ~ {l/N)Y,f-iXj, the sums 
entering equation ((7|) may be simply replaced by the double 
sums Si = Ef£i(Ej^ia;j), and S2 = T.fU{T,^^iXj)l 

Hence, we obtain the generalized definition of the Spec- 
tral Kurtosis estimator, which we state as follows: 

Corollary 2.2. Given a set oi M x N independent random 
variables {a;^} sampled from a parent population described 
by a gamma distribution f{x,a,d), the expression 
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where Si = T.tii(T,f^iXj), and S2 = Ef£i(Ef^ia;j)?, is an 
unbiased estimator that, in the absence of any outlier con- 
tamination, is expected to evaluate to unity independently 
of the particular values of the parameters involved. 



3 THE PDF OF THE GENERALIZED 
SPECTRAL KURTOSIS ESTIMATOR 

Since the expectations E[{SK)"] may be expressed in terms 
of the expectations E[{AIS2/Si)"] given by equation Q 
amended by the substitution d — )■ Nd, all statistical mo- 
ments of SK are analytically defined, which in principle im- 
plies that its pdf may be uniquely determined from an infi- 
nit e set of moments. However, it was experimentally shown 
bv lGarv et al.l (|2010f ) that an approximation of the true SK 
pdf based only on the first four statistical moments is suf- 
ficiently accurate to estimate the thresholds needed to flag 
RFI outliers with a predefined confidence level. We follow 
the same approach here. 

Using the standard notations /I'l — E{SK) = 1 and 
Hn = E[{SK — At'i)"], the mean and first central moments of 
'SK are 
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M3 = 



M4 



2Nd{Nd + l)M^r{MNd + 2) 

{M -l)r{MNd + 4) 
8Nd{Nd + l)M^T{MNd + 2) 

(M-l)2r(MiVd-(-6) 
X {{Nd + A)MNd - 5Afd - 2) 
12Nd{Nd + l)M^T{MNd + 2) 

(M-l)3r(MiVd-h8) 
X {M'-^N^d'^ + 2,M^N^d'^ + M^N^d^ + 68M^N^d^ 



-93MN^d^ + 125M^7V^d^ 



245MiV^d^ 



+84iV^d^ - 32MNd + 48Nd + 24), 

which also provide the standard statistical parameters /3i = 
Ma 7/^2 and /32 = ^'4/^2 that are directly related to the more 
commonly used skewness, 71 = y/^ and kurtosis excess, 
72 = /32 — 3. 
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A series expansion of its first standard moments, 
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(O (x) is the Baclimann-Landau notation meaning "of order 
x") is useful in assessing the influence on the shape of the SK 
distribution due to large accumulation numbers M and A'^. 
These expressions show that, although the inner accumula- 
tion over a large number of samples A'^ reduces the variance, 
skewness, and kurtosis excess, these parameters are lower 
bounded by the intrinsic limits ^2 = 2/M, 71 = 2\/2/\/M, 
and 72 = 12/M dictated by the outer accumulation num- 
ber M. Particularly, for any N , the skewness of SK does 
not vanish faster than 2\/2/\/M, which indicates that the 
asymmetry of its pdf should be considered in calculating the 
RFI thresholds even for fairly large accumulation numbers 
M. Moreover, equation (|10|) also indicates that for excessive 
averaging, i.e. large A'^, the shape of the SK pdf becomes 
practically independent of d, which means that the estima- 
tor loses its ability to distinguish outliers that would belong 
to a gamma, or gamma-like distribution characterized by a 
different parameter d. Thus, the performance of SK is im- 
proved by limiting A'^, optimally to A" = 1. 

To find the appropriate pdf approximation type , one has 
to investigate the behavior of the Pearson criterion (|Pearsonl 
ll985l : lKendall fc Stuartlll958l ) 



/?i(/?2+3)2 
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Figure [T] shows the k, diagram obtained for different com- 
binations of the accumulation number M and the product 
Nd. The diagram shows two main regions labeled Type IV 
(0 < K < 1) and Type III (k > 1), as well as a narrow re- 
gion (black) that corresponds to a Pearson Type I (k < 0) 
pdf. The horizontal dotted line indicate the values of A' (14 
for d = 1 and 28 for d = 1/2) above which it can be ana- 
lytically proven, by evaluating the limit of k for M —>■ 00, 
that the Person Type IV condition is not satisfied for any 
value of M. Since the case of the Type IV pdf, which ex- 
actly matches the first four mome nts of the true SK pdf, 
was extensively addressed by Nita fc Garvl (l2010a|), we re- 
fer the reader to our previous work for details regarding the 
use of this particular type for computing the RFI detection 
thresholds. We consider the narrow region corresponding to 
a Type I approximation to be out of the range of interest 
for practical appli cations, but refer the interested reader to 
iKendall fc StuartI l|l958l ) for a detailed analysis connecting 
the moments of the true distribution to parameters defin- 
ing the Type I pdf. Nevertheless, since the parameter region 
satisfying the conditions required by the Type VI pdf is the 
most relevant in connection to the generalization of the SK 
estimator, we address it in the next section. By doing so we 
also extend the study of the statistical properties of SK for 
A'^ = 1 to 5/d < M < 23/d, which was not addressed in our 
original study. 
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Figure 1. The k diagram obtained for different combinations of 
M and Nd. The black region corresponds to k < (Type I), the 
gray region to < k < 1 (Type IV), and the white region to 
K > 1 (Type VI). The horizontal dotted line indicate the values 
of Nd above which the condition required by a Pearson Type IV 
pdf is not satisfied for any value of M . 



3.1 The Pearson Type VI PDF 

The Pearson Type VI pdf is a beta distribution of second 
kind (iK endall fc StuartI Il958l l. which is commonly defined 
on the [0, 00) interval as 



p{x,a,l3) 



r(Q + /?) 



r(a)r(/3) (l-ha;)"+/5' 



(12) 



where a and /3 are two adjustable parameters that determine 
its shape. If 5R(q) > and K(/3) > n > 0, the distribution 
satisfies the normalization condition j^ p{x,oi, j3)dx — 1, 
and its mean and central moments to order n exist. Under 
these minimal conditions, the mean and central moments to 
order 4 of the beta distribution are 
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In order to reproduce the shape of the SK pdf, we equate 
the second and the third central moments given by equation 
(|13|l with the corresponding moments of the SK pdf given 
by equation ((9]) to obtain 



\ \2,2^ll 
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The condition /xi = 1 can be then satisfied by introducing 
the translation parameter 5 = (/? — a — l)/(/3 — 1), which 
changes the support of the beta distribution to [5, 00), with- 
out changing its central moments. 

Therefore, the distribution p{x—S, a, /3) exactly matches 
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the mean, variance, and kurtosis of the SK distribution, 
and the RFI probability of false alarm (pfa) corresponding 
to a threshold located at the ordinate ^ can be estimated 
by using the cumulative function CF(^) = J p(x, a, I3)dx 
or, alternatively, the complementary cumulative function, 
CCF(^) = J°^gp{x,a,j3)dx. These probabilities are given by 



c.(«^n^«. 



5)°'F{a,a + l3,a + l,5-0 
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and 

CCF(0 = ^^^^y ^ (g - S)-^F (^I3,a + p,l3 + l,j^], (16) 

where 

F{a,b, c, z) 



1 °° 



r(a + w)r(& + n) g" 
r(a)r(6) ^g r(c + n) 7^ 



is the regularized Gauss hypergeometric series, which is ab- 
solutely convergent for finite parameters o, 6 and c, if jz] < 1, 
conditions that are automatically satisfied either by ^ — 5 
or by l/(^ — (5), which enter as hypergeometric function 
arguments in equations (|15p and (|16|l . respectively. How- 
ever, as previously s hown in the case of the Type IV pdf 
l|Nita fc Garvl l2010ah , which also requires hypergeometric 
series evaluations, to compute the pfa with sufficient accu- 
racy for practical applications, a simple numerical integra- 
tion of the SK pdf may be considered as a viable alterna- 
tive to using the analytical expressions given by equations 
(fT5TT6l) . 




1000 2000 3000 4000 5000 6000 
M 

Figure 2. The 0.1% and 0.5% contours of the error (absolute 
values) of the fourth central moment of the Type VI pdf approx- 
imation relative to the exact moment given by equation JOj are 
shown by solid lines. The fourth moment error contours (absolute 
values) corresponding to the Type III approximation are shown 
by dotted lines. 



value of the criterion k, may provide an accurate approxi- 
mation in the case of large positive values of k, that results 
from small values in the denominator of eauation (|lip . The 
motivation behind this investigation is that the Type III 
pdf is the more convenient gamma distribution, also able 
to pr ovide a three — moment based approximation l|Pearsonl 
Il985h . 



3.2 Accuracy of the Pearson Type VI 
approximat ion 

Since the Type VI approximation exactly matches only the 
mean, variance and kurtosis of the true SK pdf, we test the 
accuracy of this approximation by evaluating the error of 
the fourth moment p4 provided by equation (|13p relative to 
the exact moment provided by equation Q. We display in 
Figure [2] a set of contour lines indicating, as absolute values, 
the relative error levels of 0.1% and 0.5% obtained over an 
extended range spanned by M and Nd. The gray shaded re- 
gion corresponds to the range of (M, A'^, d) parameters that 
allows a Pearson Type IV approximation (k < 1). However, 
the Type VI relative error contours also extend over this 
region just because, while the condition (k > 1) prevents a 
four-moment Type IV approximation in the white region, 
the (k < 1) condition still allows a three-moment Type VI 
approximation in the gray shaded region. In our opinion, 
the result of this analysis is remarkable not only due to the 
fact the Type VI approximation reproduces the fourth mo- 
ment of the true SK distribution with a relative error much 
smaller than 0.1% over most of its parameter range, but 
also, starting with a not too large accumulation number A*', 
in the region in which a Pearson Type IV approximation 
is possible. Therefore, guided by the practical goal of es- 
timating with reasonable accuracy the tail probabilities of 
the SK pdf, we seek a more convenient approximation to 
the SK pdf that, over the range of interest for practical ap- 
plications, would provide similar accuracy to the Type IV 
or Type VI approximations. A good candidate is the Type 
III pdf that, although strictly corresponding to an infinite 



3.3 The Pearson Type III approximation of the 
SK distribution 

As in the case of the Type VI pdf, in order to match the 
mean /ii = 1 of the SK distribution, we introduce the loca- 
tion parameter 5 and define the Type III pdf as. 



p{x,a,P,S) = 



{x~Sf-h 



(17) 



which is the gamma distribution f{x — 5, a, (3) having the 
first four moments given by 

Ai'i =aP + S- fi2 = o?p- 113 = 2a^/3; ^4 = ia P{P + 2). (18) 

Hence, the first three relationships provide 



Mi 



/^3 



(19) 



and replacing 112 and ^3 by the expressions ® we complete 
the solution of a Pearson Type HI approximation for the 
5^ pdf. 

The Pearson Type HI CF is 



CF{i,a,p,S)^VAp 



01 



'r(/?). 



(20) 



where ri,(/3,a::) — J^ r~^e~^dt is the incomplete gamma 
function. 

To investigate the accuracy of this approximation, we 
compute the relative error of the fourth moment given by 
equation (|18p . relative to the exact moment of the SK dis- 
tribution. Using equation (|19|) . this error can be expressed 



(21) 
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Table 1 . Threshold solutions obtained from different pdf approx- 
imations for N = W, M = 300, and d = 1 



Method 



Lower Upper e^^ (%) 



xLo-wer 
Pfa 



(%) 



cUpper 

pfa 



(%) 



Type IV 0.76613 1.28345 ODD 

Type VI 0.76648 1.28313 0.18 0.002 0.001 

Type III 0.76754 1.28212 -0.71 0.009 0.006 

Normal 0.74288 1.25712 -100 -0.093 0.162 



which compared with the criterion k given by ((TJ shows 
that, indeed, while a perfect match of fj,f^ would result in 
an infinite value of k, the Type III approximation may still 
reproduce the SK distribution with reasonable accuracy if 
a favorable combination of parameters is met. 

The dotted lines in Figure [2] show, as absolute val- 
ues, the Type III error contours for Nd G [1,30] and 
M G [2,6000]. This figure suggests that the three- moment 
Type III approximation might be accurate enough to allow 
a uniform approach over most of the parameter space rele- 
vant for practical applications, with the benefit of allowing 
the use of the more convenient equation (|20|) for estimating 
the pfa of the SK pdf. 

As a more quantitative argument in support of this as- 
sertion, we present in Table [T] the threshold solutions ob- 
tained for d = 1, A*' = 10, and M = 300, based on the 
numerical evaluation of the analytical expressions of the 
CF and CCF corresponding to the Type IV (equation (61), 
iNitafc Gar"^ [2010a'). Type VI (equations |15I16| ) and Type 
III (equation [20]) approximations. The first two columns 
display the lower and upper thresholds, the third column dis- 
plays the error in estimating the exact /J4 moment, while the 
last two columns display the difference between the Type IV 
pfa corresponding to the computed thresholds and the pre- 
defined target pfa, which was chosen to be 0.13499% at both 
ends of the distribution, the same as a 3a" standard error. 
For comparison, we show in the last row the results corre- 
sponding to the symmetrical thresholds that result from a 
normal pdf approximation. 

These results show that in this particular case, which 
corresponds to e^^ < 1%, the supplemental data loss that 
results from adopting the thresholds estimated based on the 
Type III approximation does not exceed the Type IV pfa 
(the best available estimate) by more than 0.01%. Therefore, 
based on the results shown in Figured we conclude that the 
Type III approximation would also provide similar accuracy 
over most of the range of interest for practical applications. 
On the other hand, we point out that ignoring the intrinsic 
skewness of the SK estimator, the symmetric thresholds of 
1 ± 3y/Ji2 would result in more significant data loss, as well 
as in a reduced performance in detecting certain types of 
RFI, as confirmed by the experimental results reported by 
iGarv et al.1 ()2010[ ). 

We have tested the algorithm on both simulated data 
and solar observations (from KSRBL), and find excellent 
agreement with the theoretical pdf for the RFI-free case, 
with various choices of M and A''. Space limitations in this 
Letter preclude detailed discussion. However, we note that 
the use of GSK places significant constraints on the instru- 
mental stability over the outer accumulation Af, since the 
theory assumes stationarity in the statistical behavior of the 



Gaussian noise. Even subtle fluctuations in gain, for exam- 
ple, cause a shift of the pdf mean above unity, resulting in 
excess flagging of RFI-free data (i.e. increase in pfa). For 
instruments with a long integration time r, the product Mr 

-can become quite long (seconds), during which the system 
temperature (Tsys) and gain must be constant for RFI-free 
data channels. Of course, Tsys variations in freque ncy, and on 
time s cales long compared to Mr, are permitted. iNita et al.l 

_| 2007l ) discuss the effects and limitations of accumulating 
during times of changing Tsys- 



4 CONCLUSIONS 

We have defined a generalized Spectral Kurtosis estimator 
that allows the option of using already averaged spectra as 
input for th e SK-based RFI excision algorithm originally 
proposed by iNita et al.l (|2007l ). We derived the exact an- 
alytical expressions providing its infinite set of statistical 
moments and we identified two main regions of the param- 
eter space in which SK pdf may be best approximated by a 
Pearson Type IV or Type VI pdf. However, we showed that a 
simple three-moment Pearson Type HI approximation may 
be accurate enough over most of the parameter space to jus- 
tify its use as a less computationally demanding alternative 
to the other approximations investigated. The generalized 
SK is applicable to either instantaneous or averaged data, 
and has the advantage of allowing its straightforward im- 
plementation as a RFI detection and excision tool in the 
data pipeline of already existing instruments, independently 
of the method used to obtain the PSD estimates, whether 
from FFT or narrow band time domain power detectors. 

Use of the generalized SK is likely to be superior to 
other means of detecting RFI in averaged data based on 
simple power-level thresholds because it is always 1 for RFI- 
free data, and hence eliminates the need to determine the 
mean background power level. However, the performance 
of SK is improved by limiting A'^, and newly designed in- 
struments should optimally accumulate sums of power and 
power-squ ared on instantane ous spectra (i.e. A*' = 1), as de- 
scribed in lGarv et all (|2010l '). 
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